Abstract

Background: During evolution, plants and other organisms have developed a diversity of chemical defences, 
leading to the evolution of various groups of specialized metabolites selected for their endogenous biological 
function. A correlation between phylogeny and biosynthetic pathways could offer a predictive approach enabling 
more efficient selection of plants for the development of traditional medicine and lead discovery. However, this 
relationship has rarely been rigorously tested and the potential predictive power is consequently unknown. 

Results: We produced a phylogenetic hypothesis for the medicinally important plant subfamily Amaryllidoideae 
(Amaryllidaceae) based on parsimony and Bayesian analysis of nuclear, plastid, and mitochondrial DNA sequences 
of over 100 species. We tested if alkaloid diversity and activity in bioassays related to the central nervous system are 
significantly correlated with phylogeny and found evidence for a significant phylogenetic signal in these traits, 
although the effect is not strong. 

Conclusions: Several genera are non-monophyletic, emphasizing the importance of using phylogeny for 
interpretation of character distribution. Alkaloid diversity and in vitro inhibition of acetylcholinesterase (AChE) and 
binding to the serotonin reuptake transporter (SERT) are significantly correlated with phylogeny. This has 
implications for the use of phylogenies to interpret chemical evolution and biosynthetic pathways, to select 
candidate taxa for lead discovery, and to make recommendations for policies regarding traditional use and 
conservation priorities. 
Background

During evolution, plants and other organisms have devel- 
oped a diversity of chemical defence lines, leading to the 
evolution of various groups of specialized metabolites 
such as alkaloids, terpenoids, and phenolics, selected for 
their endogenous biological function [1-7]. Intuitively, a 
correlation between phylogeny and biosynthetic pathways 
is sometimes assumed [1,8-10] and could offer a predictive 
approach enabling deduction of biosynthetic pathways 
[6,11-15], defence against herbivores [16,17], more efficient
selection of plants for the development of traditional
medicine and lead discovery [18-22] as well as inform
conservation priorities [23].

Several studies have confirmed the usefulness of specia- 
lized metabolites such as glucosinolates, iridoids, sesqui- 
terpene lactones, flavonoids, and phenolics to support 
molecular based phylogenies contradicting morphologic 
patterns [11,12,24-29]. On the contrary, several studies 
have found inconsistency of specialized metabolite profiles 
at various taxonomic levels and indicated that specialized 
chemistry and anti-herbivore defence syndromes tend to 
be poorly correlated with plant phylogeny [6,7,13,30]. 

Lack of congruence between specialized chemistry and 
phylogeny may be caused by several different phenom- 
ena. One contributing factor is convergent evolution by 
which the same or similar traits originate independently 
in taxa that are not necessarily closely related, often in 
response to similar environmental challenges [17,31]. A 
striking example of convergent evolution is the common 
use of the sex pheromone (Z)-7-dodecen-1-yl acetate by 
over 120 species of primarily Lepidopteran insects and 
female Asian elephants, Elephas maximus [32]. In rela- 
tion to plants, well known convergent morphological 
adaptations are the occurrence of prickles, thorns, and 
spines, which have evolved to avoid or limit herbivory 
[33], succulence as adaptation to dry environments in 
both North American Cactaceae and African Euphorbia 
[34,35], and insectivorous plants, which have evolved 
several times in response to a nitrogen-deficient envir- 
onment [36]. Likewise, chemical defence lines may also 
have arisen independently in unrelated taxa, and conver- 
gent evolution in plant specialized metabolism appears 
to be surprisingly common [6,17,31,37]. For example, 
the ability to produce cyanogenic glycosides appears to 
have evolved independently in many different plant fam- 
ilies [17,31]. 

However, convergent evolution can be difficult to ver- 
ify because absence of evidence is not evidence of ab- 
sence and it is possible that some compounds presently 
considered to be limited to some lineages are indeed 
more universally found in plants [31]. Specialized compounds
are not continuously expressed, but may be produced
as a response to herbivory or other damage; their
expression may also depend on the environment
[38], and plants often use a combination of several defensive
traits [7,17]. In addition, chemosystematic data are
scattered in the literature and negative results are often 
not reported. Absence or presence of a compound is also 
dependent on the amount of plant material investigated 
as well as the detection limit of the analytical methods 
[27,39]. Finally, the existence of several different phytochemical
methods can cause inconsistency in the results
reported in the literature. 

Nevertheless, reports of incongruence between phyto- 
chemistry and phylogeny have questioned the degree of 
correlation between phylogeny and specialized metabo- 
lites, indicating that such a correlation cannot simply be 
assumed [6]. However, this relationship has rarely been 
tested because of lack of accurate estimates of phylogeny 
and corresponding chemical data; lack of tradition for 
interdisciplinary studies bridging botany, chemistry, and 
molecular systematics; and appropriate statistical tools. 
Consequently, the potential predictive power is un- 
known [17]. In the present study, we use Amaryllidaceae 



subfamily Amaryllidoideae as a model system for testing 
the correlation between phylogenetic and chemical di- 
versity and biological activity. 

Amaryllidaceae subfamily Amaryllidoideae sensu APG III 
[40] (formerly recognized as a separate family, Amaryllida- 
ceae J.St.-Hil.) is a widely distributed subfamily of 59 genera 
and about 850 species. Amaryllidoideae has centres of di- 
versity in South Africa, South America, particularly in the 
Andean region, and in the Mediterranean, three of the 
recognized hotspots of biodiversity on Earth [41,42]
(www.biodiversityhotspots.org). Plants of the Amaryllidoideae are
used in traditional medicine to treat mental problems, pri- 
marily in Southern Africa [43,44]. The traditional use of 
plants of Amaryllidoideae has been related to their unique 
and subfamily specific alkaloid chemistry (Figure 1). Over 
500 alkaloids have been described from various species and 
have been subdivided into 18 major types based on hypo- 
thetical biosynthetic pathways [45-47]. Extracts or isolated 
alkaloids of Amaryllidaceae species have shown activity 
in vitro in a range of assays related to disorders of the central
nervous system, primarily Alzheimer's disease (inhibition
of acetylcholinesterase, AChE) [21,48-51], and anxiety
and depression (affinity to the serotonin re-uptake transporter,
SERT) [21,52,53]. Galanthamine is registered in a
number of countries as an AChE inhibitor (Reminyl or
Razadyne; Janssen Pharmaceutica) [54]. Another Amaryllidaceous
alkaloid, sanguinine (9-O-demethylgalanthamine),
has been shown to be a ten times more potent inhibitor of AChE
than galanthamine in vitro [55].

Amaryllidaceae subfamily Amaryllidoideae is therefore 
an ideal model system for comparing phylogenetic and 
chemical diversity with bioactivity. Previous molecular 
phylogenetic studies based on plastid gene regions (rbcL, 
trnLF, and ndhF) have confirmed Amaryllidoideae as 
monophyletic and resolved many taxa into geographic- 
ally confined monophyletic groups [42,56]. The African 
tribe Amaryllideae has been well supported as sister 
group to the remaining taxa. However, the relationships
among several other early diverging lineages, in particular
the African tribes Haemantheae and Cyrtantheae
and the Australasian Calostemmateae, are not well supported
by previous studies and remain problematic [42].
In a study by Meerow and Snijman [42] based on parsi- 
mony analysis of plastid ndhF sequences, Amaryllideae 
also resolved as sister to the remainder of the subfamily. 
The next major split resolved a clade with American and 
Eurasian subclades, and an African/Australasian clade 
with Cyrtantheae as sister to a Haemantheae/Calostem- 
mateae clade. However, this African/Australasian clade 
was not supported by bootstrap analysis. 

The objectives of the present study were: (1) to pro- 
duce a comprehensive and well supported phylogenetic 
hypothesis of Amaryllidaceae subfamily Amaryllidoideae 
based on total evidence from DNA regions from all 
Figure 1 Alkaloid types recovered in the present study. Alkaloids were classified to type according to the hypothetical biosynthetic pathways
proposed by Jin [45,46]. Marvin was used for drawing and displaying chemical structures, Marvin 5.9.0, 2012, ChemAxon (http://www.chemaxon.com).



three plant genomes; (2) to test for correlation between 
phylogenetic and chemical diversity and central nervous 
system (CNS) related activities. 

Results 

Phylogeny 

The ITS region was the most informative region, followed
by the matK region (Table 1). The trnLF and the
nad1 regions resolved 10% or less of the clades with
strong support, defined as > 90% bootstrap support.



There were no strongly supported conflicts among any 
regions (Bayesian consensus tree with posterior prob- 
abilities and parsimony bootstrap consensus tree for the 
total evidence analysis and bootstrap consensus trees of 
individual regions are provided as Additional file 1: 
Figures S1-S7). The number of resolved clades (87%; 
Table 1) and the number of resolved clades supported 
by > 90% bootstrap (62%) was highest in the total evi- 
dence analysis, which was also the only analysis that 
resolved all the major lineages. The Bayesian analysis of 



Table 1 Details of the matrices included in this study

Matrix            # taxa  # aligned characters  # of PPI(a) (%)  # MP trees  Length of MP trees  CI(b)  RI(c)  Percent clades with > 50% BS(d)  Percent clades with > 90% BS(d)
ITS               105     953                   502 (53)         6520        2537                0.42   0.82   78                               55
trnLF             106     1163                  185 (16)         470         601                 0.70   0.86   33                               10
matK              105     2019                  295 (15)         9940        922                 0.74   0.90   67                               39
Plastid combined  107     3182                  480 (15)         3620        1544                0.71   0.88   76                               45
nad1              104     1726                  53 (3)           8330        275                 0.79   0.97   28                               7
Total evidence    109     5861                  1086 (19)        554         4454                0.53   0.85   87                               62

(a) PPI: potentially parsimony informative characters. (b) CI: Consistency index. (c) RI: Retention index. (d) Percent of resolved clades in the bootstrap consensus tree
with > 50% BS (bootstrap support) and with > 90% BS are proportions of the possible number of clades (# taxa - 1).
the total evidence matrix provided the same overall 
topology as parsimony analysis and all major clades 
were strongly supported by Bayesian analysis (Figure 2; 
Additional file 1: Figures S1 and S2). 

The topology (Figure 2) of the total evidence analysis 
largely supports previous studies [42,56]. The African 
tribe Amaryllideae (100% BS; PP = 1.00) is sister to the re- 
mainder of the Amaryllidoideae (100% BS; PP = 1.00), 
and this is strongly supported by all analyses. The next 
major split resolves an American clade (66% BS; PP = 
0.99) and a Eurasian clade (65% BS; PP = 1.00) as sisters 
(100% BS; PP = 1.00) and a clade (66% BS; PP = 1.00) with 
the African monogeneric tribe Cyrtantheae (100% BS; 
PP = 1.00), and tribe Haemantheae (100% BS; PP = 1.00) 
as sisters (100% BS; PP = 1.00), and the Australasian tribe 
Calostemmateae (100% BS; PP = 1.00) as sister to these. 

In the ITS analysis (Additional file 1: Figure S3), tribe 
Amaryllideae (100% BS) is sister to the remainder of 
Amaryllidoideae (61% BS). Within the remainder of 
Amaryllidoideae, tribe Calostemmateae (100% BS) is sis- 
ter to a clade (54% BS) including tribes Cyrtantheae 
(100% BS), Haemantheae (95% BS) and the American and 
European Amaryllidoideae. Tribes Cyrtantheae and Hae- 
mantheae are sisters (96% BS). In the combined plastid 
analysis (Additional file 1: Figure S6), Cyrtantheae is sis- 
ter to the remainder of Amaryllidoideae except tribe Amar- 
yllideae (75% BS). In both the matK (Additional file 1: 
Figure S4 supporting online material) and the combined 
plastid analysis, tribe Calostemmateae is sister to tribe Haemantheae
(matK: 70% BS; plastid: 94% BS).

The low bootstrap support (65%) for the Eurasian clade 
in the total evidence analysis (Figure 2, Additional file 1: 
Figure S2) may be caused by uncertainty in the placement
of the genus Lycoris. The remainder of the Eurasian
clade is strongly supported in all analyses except trnLF
and nad1, which are the two regions providing the least
resolution and support in general (Additional file 1: 
Figures S3-S7 in the supporting online material). 

Relationship of phylogeny to chemistry and bioactivity 

The relationship of individual types of compounds and 
biological activity could be assessed using the D metric 
developed to deal with discrete binary characters [57]. D 
is equal to 1 if the observed chemical component has a 
random distribution (i.e. no phylogenetic signal). D is 
equal to 0 if the component is distributed exactly as 
would be predicted under a Brownian motion model of 
gradual divergent evolution (i.e. strong phylogenetic signal).
See Materials and Methods for details. Of the seven
alkaloid types found in more than one species,
we found evidence for phylogenetic signal in five
types (Table 2a) [57]. With the exception of tazettine
and galanthindole, D values were significantly different 
from that expected under a random distribution of the 



components across the phylogenetic tree (D = 1). Inhibition
of acetylcholinesterase (AChE) and binding to the
serotonin re-uptake transporter (SERT) were used as
proxies for biological activity. Both measures of bioactivity,
AChE activity and SERT activity, also displayed significant
phylogenetic signal (Table 2b). However, in none
of the above cases was the phylogenetic signal sufficiently
strong to be considered indistinguishable from
a Brownian motion model of evolution, where traits are
strongly clumped on the phylogeny (and D = 0) [57].
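
The scaling behind the D metric can be sketched in a few lines of Python. This is an illustration only and not the code used in this study. It assumes the published definition of D [57], in which the observed sum of sister-clade differences is scaled between its mean under random shuffling of tip values (D = 1) and its mean under simulated Brownian evolution (D = 0). The function names, the input lists of simulated sums, and the choice of non-strict inequalities for the empirical P values are assumptions made for this sketch.

```python
from statistics import mean


def scaled_d(observed_sum, random_sums, brownian_sums):
    """Scale an observed sum of sister-clade differences to D.

    D = (observed - mean(brownian)) / (mean(random) - mean(brownian)),
    so the random expectation maps to 1 and the Brownian expectation to 0.
    """
    mean_random = mean(random_sums)
    mean_brownian = mean(brownian_sums)
    if mean_random == mean_brownian:
        raise ValueError("random and Brownian expectations coincide; D is undefined")
    return (observed_sum - mean_brownian) / (mean_random - mean_brownian)


def empirical_p_values(observed_sum, random_sums, brownian_sums):
    """Return (P(D = 1), P(D = 0)) as simple empirical proportions.

    Assumption of this sketch: P(D = 1) is the share of random permutations
    whose sum is at least as small as the observed one, and P(D = 0) is the
    share of Brownian simulations whose sum is at least as large.
    """
    p_random = sum(s <= observed_sum for s in random_sums) / len(random_sums)
    p_brownian = sum(s >= observed_sum for s in brownian_sums) / len(brownian_sums)
    return p_random, p_brownian
```

Under this reading, a trait with a low P(D = 1) and a low P(D = 0) is more clumped than random but less clumped than Brownian expectation, which is the pattern reported here for most alkaloid types and both bioactivity measures.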

There was a statistically highly significant correlation be- 
tween differences in chemical profile and phylogenetic dis- 
tance, indicating that closely-related species tend to have 
more similar chemical profiles than more distantly-related 
species, although the effect was not strong (Mantel test: 
r = 0.085, p = 0.002). There was also statistically significant 
correlation between chemical profile and phylogenetic dis- 
tance in the genus level comparison (Mantel test: 
r = 0.090, p = 0.024), although the effect was also weak. 
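
For readers unfamiliar with the Mantel test, the following minimal Python sketch shows the general idea: the correlation between two distance matrices is compared with its distribution when the taxon labels of one matrix are permuted. It is not the implementation used in this study; the function names, the one-sided permutation P value, and the default number of permutations are assumptions made for illustration.

```python
import math
import random


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test for two symmetric distance matrices.

    Rows and columns of dist_b are permuted together, so each permutation
    relabels the taxa while keeping the matrix a valid distance matrix.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [dist_b[order[i]][order[j]]
                    for i in range(n) for j in range(i + 1, n)]
        if pearson(flat_a, permuted) >= observed:
            at_least_as_large += 1
    # The observed arrangement is counted as one of the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value
```

In the present context, dist_a would hold pairwise phylogenetic distances and dist_b the pairwise number of alkaloid type differences. A small r with a small P value, as reported above, indicates a weak but non-random association.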

Discussion 

Phylogeny of Amaryllidoideae 

For the purpose of the present study, we consider the 
total evidence approach to provide the best estimate of 
phylogeny and all major lineages are supported by both 
parsimony and Bayesian analyses. The present study has 
doubled previous sampling of Amaryllidoideae from 51 
species [56] to 108 and from combined analysis of two 
DNA regions [42,56] to four DNA regions representing 
all three genomes. The only previous study resolving 
relationships among basal lineages was based on plastid 
ndhF sequences [42] and resolved Calostemmateae and 
Haemantheae as sisters and tribe Cyrtantheae as sister 
to these. However, in the present study (Figure 2), the 
African tribes Cyrtantheae and Haemantheae are 
strongly supported by both bootstrap and Bayesian pos- 
terior probabilities as sister clades. Tribe Calostemma- 
teae is sister to these, although this was only weakly 
supported by the bootstrap, but strongly supported by 
Bayesian posterior probabilities. A sister group rela- 
tionship of the two African tribes Cyrtantheae and 
Haemantheae and the Australasian tribe Calostemma- 
teae as sister to these appears more convincing than the 
alternative based on biogeography. However, in terms of 
morphology there may be some room to question this 
relationship. The indehiscent capsule of Calostemmateae 
has more in common with the indehiscent baccate fruit 
of Haemantheae (resembling the unripe fruit of Clivia, 
Scadoxus, Haemanthus, and Cryptostephanus) than with 
the dehiscent capsule of Cyrtanthus.

Phylogenetic signal of chemical diversity and bioactivity 

Our approach to quantify overall correlation between 
chemical and phylogenetic diversity has previously been 

Figure 2 Phylogenetic hypothesis for Amaryllidaceae subfamily Amaryllidoideae. Obtained after 1,000,000 replicates of Bayesian inference.
Parsimony bootstrap percentages and Bayesian posterior probabilities (BS/PP) are indicated for major clades only. Examples of members are
illustrated on the right hand side.
Labels in the figure: Amaryllideae, Nerine, Boophone, Cyrtantheae, Haemantheae, Phaedranassa, Sprekelia.

applied to show positive correlations between pheromone 
differences and nucleotide divergence in Bactrocera fruit 
flies [58] and phylogenetic correlation of cuticular hydro- 
carbon diversity in ants [59]. We have now shown the po- 
tential application of this approach to explore correlations 
between phylogenetic and chemical diversity of medicinal 
plants. 

We found significant phylogenetic signal for five out 
of seven tested individual alkaloid types and for both 
AChE and SERT bioactivity proxies, although the signal 
was not strong enough to be indistinguishable from a 
Brownian model of evolution, where traits are strongly 
clumped on the phylogeny [57]. There was also a highly 
significant correlation between differences in chemical 
profiles and phylogenetic distance in both species and 
generic level analyses. Despite this, there was often still 
considerable difference in chemical make-up even be- 
tween phylogenetically very close species (Figure 3) with 
the average number of alkaloid types differing between 
congeners being 2.26. Whereas members of some genera 
such as Crinum and Galanthus have generally similar 
chemical profiles, other genera such as Hippeastrum and 
Narcissus have striking diversity. 
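
The count of alkaloid type differences between two species can be illustrated with a short Python sketch. It assumes that each chemical profile is recorded as the set of alkaloid types detected in a species and that the difference between two profiles is the number of types present in exactly one of them. The function names and the convention that the genus is the first word of the species name are assumptions of this sketch, not a description of the scripts used in this study.

```python
from itertools import combinations


def type_differences(profile_a, profile_b):
    """Number of alkaloid types present in exactly one of the two profiles."""
    return len(set(profile_a) ^ set(profile_b))


def mean_congener_difference(profiles):
    """Mean number of type differences over all pairs of congeneric species.

    profiles maps a species name ("Genus epithet") to the set of alkaloid
    types recorded for it. Returns None if no genus has two or more species.
    """
    differences = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        if name_a.split()[0] == name_b.split()[0]:
            differences.append(type_differences(prof_a, prof_b))
    if not differences:
        return None
    return sum(differences) / len(differences)
```

Applied to the full data set, a summary of this kind corresponds to the average difference between congeners reported above.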

An explanation for the moderate correlation found 
could be either methodological artefacts or underlying 
ecological or genetic differences [38]. We minimized 
methodological artefacts by using the same plant accessions
for the phylogenetic, chemical, and bioactivity
studies, and by analysing our data with consistent methods.
Chemical profiles were based on types deduced from



Table 2 Phylogenetic signal in chemistry and biological
activity determined using Fritz and Purvis's [57] D metric
(see Materials and Methods for details)

a) Chemical components

Alkaloid group   D        P (D = 1)   P (D = 0)
Crinine          0.6768   0.021       0
Galanthamine     0.549    0           0.009
Lycorine         0.77     0.018       0
Galanthindole    1.091    0.621       0.011
Homolycorine     0.769    0.021       0
Montanine        0.572    0.002       0.01
Tazettine        0.852    0.094       0

b) Biological activity

Measure          D        P (D = 1)   P (D = 0)
AChE             0.679    0.004       0.001
SERT             0.634    0.037       0.044

D is equal to 1 if the observed chemical component has a random distribution
(i.e. no phylogenetic signal). D is equal to 0 if the component is distributed
exactly as would be predicted under a Brownian motion model of gradual
divergent evolution (i.e. strong phylogenetic signal). P values represent the
probability that the observed D value is equal to 1 or 0, respectively (P > 0.05
indicates that the observed value is not significantly different from these
values).



hypothetical pathways and could be an oversimplification 
of the chemical diversity contained by over 500 individual 
alkaloid structures known from the subfamily [45,47]. 

The strength of correlation could be dependent on taxo- 
nomic scale. Whereas alkaloids derived from norbelladine 
and its derivatives are almost exclusively restricted to the 
subfamily Amaryllidoideae [45], and alkaloids with AChE 
activity appear to be phylogenetically constrained within 
Narcissus [18], the considerable variation at the species 
and genus level found in this study corresponds well with 
within-species variation of alkaloid profiles in, for example,
Galanthus [51,60]. 

Evaluation of extensive historical drug data, marine nat- 
ural products, medicinal plants and bioactive natural pro- 
ducts suggests that drugs are derived mostly from pre- 
existing drug-productive families that tend to be clustered 
rather than randomly scattered in the phylogenetic tree of 
life [61]. Zhu et al. [61] further suggest that efforts to identify
new potential drugs can therefore be concentrated on
exploring a number of drug-productive clusters. However, 
based on our results, such a strong presumed correlation 
between phylogeny and bioactivity appears to be an over- 
simplification, at least at the taxonomic scale tested in our 
study. Based on our data for the medicinally important 
plant subfamily Amaryllidoideae, it appears that phylogeny 
can predict chemical diversity and bioactivity, but consid- 
erable caution must be emphasized. We also suggest that 
phylogenetic correlation of chemical traits of interest may 
need to be assessed for a particular phylogenetic frame- 
work before it is used for prediction of occurrence in un- 
investigated taxa. 

Application of phylogenetic prediction and in silico data 
mining 

A predictive approach could enable deduction of biosyn- 
thetic pathways, defence against herbivores, more efficient 
selection of plants for the development of traditional medi- 
cine and lead discovery as well as inform conservation pri- 
orities as outlined in the introduction. A plethora of data 
on phylogenetic relationships, chemical constituents and 
bioactivity are available through public databases (e.g., Gen- 
Bank) and in the literature. Systematic in silico data mining 
could enable more efficient use of predictive approaches to 
speed up all of the above applications [22,61-65]. 

However, a methodological framework still needs to be 
developed. In the present study, we have suggested an ap- 
proach for testing correlations between phylogenetic and 
chemical diversity and biological activity using experimen- 
tal data generated for this purpose. One method for subse- 
quent identification of specific nodes in phylogenies with 
high bio-screening potential has been proposed by Saslis-Lagoudakis
et al. [22] using tools from community ecology.
In this approach, a matrix of ethnomedicinal use was
composed and used to identify nodes in a phylogeny of 

Figure 3 Relationship between phylogenetic distance and chemical diversity (number of alkaloid group differences) for 
Amaryllidaceae subfamily Amaryllidoideae. A) Scatter plot showing distribution of points and best-fit regression line. The size of each point is 
proportional to the log-transformed number of data points contributing to it. B) The same data showing mean (±s.e.) number of differences for 
categories of phylogenetic distance (number of substitutions per site) for ease of visualization. The overall relationship is significant. There is a 
statistically highly significant correlation between differences in chemical profile and phylogenetic distance, although the effect is not strong 
(Mantel test: r = 0.085, p = 0.002), indicating that closely related species tend to have more similar chemical profiles than more distantly related 
species. 




Pterocarpus (Fabaceae), which have more medicinal taxa 
related to a specific category of use than expected by 
chance. This approach could be useful for identifying alternative
resources or substitute taxa in cases where supply
of a medicinal plant or natural product of interest is
limited or where species in use are subject to conservation 
concerns [22]. However, for the purpose of increasing the 
chance of making truly new discoveries such as new com- 
pounds and/or new activity profiles, it may be more rele- 
vant to identify clades that possess activity of interest and 
at the same time do not correspond to well known com- 
pounds with well known activity profiles [66]. 

Other methods for predictive in silico data mining may 
be combined with a phylogenetic selection approach, e.g., 
exploration of natural product chemical space as devel- 
oped by Backlund and co-workers [65,67,68]. Another 
computerized geospatial tracking tool linking bioactive 
and phylogenetic diversity has been developed for micro- 
organisms [63]. The concept of virtual parallel screening 
developed for natural products by Rollinger [64], which 
simultaneously enables fast identification of potential tar- 
gets, insight into a putative molecular mechanism and es- 
timation of a bioactivity profile, could allow for optimal 
selection of relevant targets. 

Conclusion 

In conclusion, we have shown significant correlation be- 
tween phylogenetic and chemical diversity and biological 
activity in the medicinally important plant subfamily 
Amaryllidoideae. However, a correlation cannot be 
assumed for other study systems without considerable 
caution or testing. This has implications for the use of 
phylogenies to interpret chemical evolution and biosyn- 
thetic pathways, to select candidate taxa for lead discovery, 
and to make recommendations for policies regarding trad- 
itional use and conservation priorities. Phylogenetic pre- 
diction of chemical diversity and biological activity may 
provide an evolutionary based tool alone or in combin- 
ation with other recently developed tools for in silico data 
mining of natural products and their bioactivity. 

Methods 

Taxon sampling 

Specimens were collected in their natural habitat or 
obtained from botanical gardens or specialist nurseries. 



Sampling included 108 (over 10%) of circa 850 species 
in Amaryllidaceae subfamily Amaryllidoideae [69] with 
Agapanthus campanulatus L. used as outgroup 
(Additional file 2). Sampling represents 43 of circa 60 
genera and all currently recognized tribes except 
Griffineae Ravenna [41,42,56]. Samples from tribes 
Galantheae and Haemantheae were partly retrieved from 
previous studies [21,51]. The same accessions of plant 
material were used for the molecular, chemical, and 
bioactivity analysis to minimize effects of intraspecific 
and ecological variation. 

Phylogenetic analyses 

DNA was extracted using the Qiagen DNeasy kit 
(Qiagen, Copenhagen, Denmark) from 20 mg of dried 
leaf fragments. Amplification and sequencing of the nuclear
encoded ITS and plastid encoded matK and trnLF
regions followed Larsen et al. [51]. Amplification and sequencing
of the mitochondrial nad1 region followed
Cuenca et al. [70]. Primers used for amplification and
sequencing are listed in Additional file 3. Both strands
were sequenced for each region for all taxa whenever
possible. Sequences were edited and assembled using Sequencher
4.8™ software (Gene Codes, Ann Arbor, MI,
USA). All sequences are deposited in GenBank and accession
numbers JX464256-JX464610 are listed in
Additional file 2. Sequences were aligned using default 
options in MUSCLE [71] as implemented in the software 
SeaView [72]. 

Phylogenetic analyses were conducted using both par- 
simony and Bayesian inference. Most parsimonious trees 
(MP) were obtained with PAUP v. 4.0b10 [73] using 
1,000 replicates of random taxon addition sequence and 
TBR branch swapping saving multiple trees. All charac- 
ters were included in the analyses and gaps were treated 
as missing data. We analysed the four regions separately 
to identify strongly supported phylogenetic conflicts 
among the regions, prior to performing a combined total 
evidence analysis. By using total evidence the explana- 
tory and descriptive power of the data is maximized 
[74]. Bootstrap analyses [75] of the four individual data- 
sets and the combined dataset were carried out using 
1,000 replicates. Bayesian analysis of the combined data- 
set was performed with MrBayes 3.1.2 [76]. We first 
selected the best fitting model (GTR + I + G; Parameters: 
lset NST = 6 RATES = gamma) of molecular evolution 
using the Akaike criterion (AIC) in Modeltest v. 3.8 [77]. 
The analysis was performed with 1,000,000 generations 
on four Monte Carlo Markov chains. The average stand- 
ard deviation of the split frequencies was 0.01 after 
232,000 generations and < 0.005 after 1 million genera- 
tions corresponding to an effective sample size of 115 
using the software Tracer v. 1.5.0 [78]. The first 2,500 
(25%) trees of low posterior probability were deleted and 
all remaining trees were imported into PAUP. A majority 
rule consensus tree was produced showing the posterior 
probabilities (PP) of all observed bi-partitions. We also 
performed a partitioned analysis allowing different models
for the three genomes. However, given the limited
information present in our plastid and mitochondrial
datasets (Table 1), partition-rich strategies are
not always the best ones, and in some cases less complex
strategies have performed better [79,80]. Although the
Bayesian MCMC approach is good at handling complex 
models, there is a risk of over-parameterization, which 
can result in problems with convergence and excessive 
variance in parameter estimates [81]. 

Chemical diversity 

Alkaloids were extracted from 300 mg dried bulb scales 
using 0.1% H2SO4 and clean-up on ion-exchange solid 
phase columns as described by Larsen et al [51]. All 
extracts were concentrated under vacuum until dryness 
and re-dissolved to a standard concentration of 
5 mg ml⁻¹ in MeOH. Alkaloid profiles were obtained by
gas chromatography-mass spectrometry (GC-MS) as 
described by Larsen et al [51] using a method developed 
by Berkov et al [82]. Alkaloids were identified to type by 
comparison with the NIST 08 Mass Spectral Search Pro- 
gram, version 2.0 (NIST, Gaithersburg, Maryland) and 
with published spectral data. Alkaloid structures were 
scored to one of eighteen types (Figure 1) proposed by 
Jin [45,46] based on hypothetical biosynthetic pathways 
[51]. Only nine of the eighteen alkaloid types were 
recorded in the present study (Figure 1). Each type shows
characteristic fragmentation patterns in the MS spectra
[83]. In most of the cases, the database proposals with 
highest similarity could therefore be used to score the 
candidate structure indirectly to one of the types. Candi- 
date structures were excluded from the profile if they 
could not be scored unambiguously to types. 

In vitro biological activity 

AChE inhibition and SERT affinity of the standardized al- 
kaloid extracts were tested using published methods [21]. 
The AChE assay was conducted using isolated acetylcholinesterase
(Electrophorus electricus, Sigma, Germany) and
the SERT assay using homogenates of whole rat brains except
cerebellum. Galanthamine and fluoxetine hydrochloride
were used as positive standards in the AChE and SERT
assays, respectively. Data were analysed with the software 
package GraFit 5 (Erithacus Software Ltd.). Activity values 
are means of three individual determinations each per- 
formed in triplicate. In an initial screening, AChE inhibition 
was defined as minimum 50% inhibition at a concentration 
of 1.0 µg ml⁻¹. Subsequently, IC50 values were determined
for all extracts deemed active according to the initial screening.
IC50 values < 50 µg ml⁻¹ were considered active for the
analysis. SERT activity was defined as more than 85% binding
of extract at 5 mg ml⁻¹. Subsequently, IC50 values were
determined for all extracts deemed active according to the initial
screening. IC50 values < 50 µg ml⁻¹ were considered active
for the analysis. These activity levels were designed to
reflect the observed level of activity in the present study
and do not necessarily reflect levels of pharmacological
relevance, but they are within the range of proposed ecological
relevance [84]. SERT activity data were not determined for
eight species of Narcissus and these samples were pruned 
from the phylogenetic trees in the correlation tests. 
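
The two-step scoring described above (an initial screen followed by an IC50 cut-off) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the example values are hypothetical, and treating an extract without a determined IC50 as inactive is our assumption rather than something stated in the text.

```python
from typing import Optional


def score_activity(screen_value: float,
                   ic50: Optional[float],
                   screen_cutoff: float,
                   inclusive: bool,
                   ic50_cutoff: float = 50.0) -> int:
    """Return 1 (active) or 0 (inactive) for one extract.

    screen_value  : percent inhibition (AChE) or percent binding (SERT)
                    measured in the initial screening.
    ic50          : IC50 in µg/ml, or None if it was not determined.
    screen_cutoff : 50.0 for AChE, 85.0 for SERT (values from the text).
    inclusive     : True for "minimum 50%" (AChE, >=),
                    False for "more than 85%" (SERT, >).
    """
    if inclusive:
        passed = screen_value >= screen_cutoff
    else:
        passed = screen_value > screen_cutoff
    if not passed:
        return 0
    if ic50 is None:
        # Assumption: no IC50 available means the extract is scored inactive.
        return 0
    return 1 if ic50 < ic50_cutoff else 0


# Invented example values, for illustration only.
ache = score_activity(63.0, 12.5, screen_cutoff=50.0, inclusive=True)
sert = score_activity(85.0, 30.0, screen_cutoff=85.0, inclusive=False)
print(ache, sert)  # 1 0  (85% binding is not "more than 85%")
```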

Phylogenetic signal 

We assessed the relationship of phylogeny to chemical 
diversity and biological activity by calculating the phylo- 
genetic signal present for individual alkaloid types and 
types of biological activity. Each alkaloid type was coded 
as being either present (1) or absent (0) for each species. 
Likewise, for biological activity, species were scored for 
the presence or absence of AChE inhibition or SERT 
binding activity. Two of the alkaloid traits (belladine and 
cherylline) are found only in one species each, rendering 
calculation of phylogenetic signal for these traits mean- 
ingless, so they were not included in this part of the 
analysis. 
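
As an illustration of this coding step, the following Python sketch builds a presence/absence matrix from alkaloid profiles and drops traits recorded in fewer than two species. The species labels and profiles are invented, and the helper names are hypothetical; the original analysis was carried out in R.

```python
# Invented profiles: species label -> set of alkaloid types detected.
profiles = {
    "species_A": {"lycorine", "galanthamine"},
    "species_B": {"lycorine"},
    "species_C": {"belladine", "crinine"},
    "species_D": {"crinine", "lycorine"},
}


def binary_matrix(profiles):
    """Code each alkaloid type as present (1) or absent (0) per species."""
    traits = sorted(set().union(*profiles.values()))
    coded = {
        species: [1 if trait in found else 0 for trait in traits]
        for species, found in profiles.items()
    }
    return traits, coded


def informative_traits(traits, coded, min_species=2):
    """Keep only traits present in at least `min_species` species."""
    counts = [sum(row[i] for row in coded.values()) for i in range(len(traits))]
    return [trait for trait, n in zip(traits, counts) if n >= min_species]


traits, coded = binary_matrix(profiles)
kept = informative_traits(traits, coded)
print(traits)  # ['belladine', 'crinine', 'galanthamine', 'lycorine']
print(kept)    # ['crinine', 'lycorine']
```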

To quantify phylogenetic signal we used the recently
developed D metric [57], specifically developed to deal
with discrete binary traits. D is calculated as follows:

$$D = \frac{S_{obs} - S_b}{S_r - S_b}$$

where $S_{obs}$ is the observed number of changes in the binary
trait (here, a chemical component) across the ultrametric
phylogeny, $S_r$ is the mean number of changes
generated from 1000 random permutations of the species
values at the tips of the phylogeny, and $S_b$ is the
mean number of changes generated from 1000 simulations
of the evolution for the character by a Brownian
motion model of evolution with likelihood of change
being specified as that which produces the same number
of tip species with each character state as the observed
pattern. A D value of 1 ($S_{obs} = S_r$) indicates that the trait
has evolved in a way that cannot be distinguished from a
random manner (i.e., no phylogenetic signal), whilst a D
value of 0 ($S_{obs} = S_b$) indicates that the trait has evolved
in a phylogenetically highly correlated manner. Estimation
of whether D differs significantly from 1 or 0 is
achieved by evaluating where the observed number of
changes ($S_{obs}$) fits within the distribution of the 1000
generated $S_r$ and $S_b$ values respectively. Thus if 95% or
more values of $S_r$ are greater than $S_{obs}$ then
P(D = 1) < 0.05 and the trait is significantly more phylogenetically
structured than random expectation. Calculation
of D was carried out using the packages caper [85]
and ape [86] in the R v2.14.0 framework [87].
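
The arithmetic behind D and its two significance tests can be sketched in Python, given an observed number of changes and the two sets of simulated values. This is not the caper implementation: the function names are hypothetical, the input values are invented, and the test against D = 0 is written by analogy with the test against D = 1 described above, which is our assumption.

```python
def d_statistic(s_obs, s_random, s_brownian):
    """D = (S_obs - mean(S_b)) / (mean(S_r) - mean(S_b))."""
    s_r = sum(s_random) / len(s_random)
    s_b = sum(s_brownian) / len(s_brownian)
    if s_r == s_b:
        raise ValueError("random and Brownian expectations coincide")
    return (s_obs - s_b) / (s_r - s_b)


def p_random(s_obs, s_random):
    """P(D = 1): share of permuted values NOT greater than S_obs."""
    return sum(1 for s in s_random if s <= s_obs) / len(s_random)


def p_brownian(s_obs, s_brownian):
    """P(D = 0): share of Brownian values at least as large as S_obs
    (assumed mirror image of the random test)."""
    return sum(1 for s in s_brownian if s >= s_obs) / len(s_brownian)


# Invented values; a real analysis would use 1000 draws of each kind.
s_obs = 6
s_random = [10, 11, 9, 12, 10, 8, 11, 10, 9, 10]   # mean 10
s_brownian = [4, 5, 3, 6, 4, 5, 4, 3, 5, 1]        # mean 4

d = d_statistic(s_obs, s_random, s_brownian)       # (6 - 4) / (10 - 4)
print(round(d, 3))                                 # 0.333
print(p_random(s_obs, s_random))                   # 0.0
print(p_brownian(s_obs, s_brownian))               # 0.1
```

In this toy case the trait would be scored as significantly more structured than random (P(D = 1) < 0.05) while not differing significantly from the Brownian expectation.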

We also quantified the relationship between overall 
chemical profile and phylogeny following an approach 
used to study the evolution of pheromone chemical diver- 
sity [57,58]. We constructed pairwise matrices; one of 
ultrametric phylogenetic distances (summed branch 
lengths) between species, and the other of chemical differ- 
ence calculated as the binary squared Euclidean distance 
(i.e., the total number of alkaloid types that are absent in 
one taxon but present in another and vice versa). In 
addition to using all included species as terminal taxa, we 
also pruned the phylogenetic tree to genera and compared 
the resulting distance matrix to summed chemical profiles 
for each genus. In the case of polyphyletic genera both 
clades were retained. The correlation between phylogen- 
etic distance and chemical difference was calculated using 
Mantel tests, with rows and columns of the distance
matrix being randomly permuted and the correlation coefficient
recalculated 999 times to generate a null frequency
distribution. These tests were performed using the
program GenAlEx [88]. 
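
The distance measure and the permutation procedure described above can be sketched in Python as follows. This is an illustration only and does not reproduce GenAlEx: it assumes the Pearson correlation as the Mantel statistic and a one-tailed test, and the four-taxon distance matrix and alkaloid profiles are invented.

```python
import random


def binary_sq_euclidean(a, b):
    """Number of alkaloid types present in one taxon but absent in the other."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-tailed P) by permuting rows/columns of d2."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Rows and columns are permuted together, keeping the matrix symmetric.
        perm = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Invented data: two pairs of closely related taxa.
phylo = [[0, 2, 6, 6],
         [2, 0, 6, 6],
         [6, 6, 0, 2],
         [6, 6, 2, 0]]
chem_profiles = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
chem = [[binary_sq_euclidean(a, b) for b in chem_profiles] for a in chem_profiles]

r, p = mantel(phylo, chem)
print(round(r, 3), p)
```

With only four taxa there are just 24 distinct permutations, so the P value here is coarse; the example is meant to show the mechanics rather than a meaningful test.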